Frameshift alignment: statistics and post-genomic applications

Authors

  • Sergey Sheetlin
  • Yonil Park
  • Martin C. Frith
  • John L. Spouge
Abstract

MOTIVATION The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score.

RESULTS We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two 'post-genomic' applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.
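
For readers unfamiliar with the "classic BLAST statistics" the abstract refers to, the sketch below shows the standard Karlin-Altschul form, E = K·m·n·exp(-λS), and the corresponding bit score. It is only an illustration of that classic formula, not the authors' frameshift method: the function names are hypothetical, and the values of λ and K in the usage lines are placeholders rather than parameters fitted for any real scoring scheme.

```python
import math


def evalue(score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring >= score.

    Classic Karlin-Altschul form: E = K * m * n * exp(-lambda * S).
    lam and k depend on the scoring scheme and must be supplied by the caller.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def bit_score(score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Placeholder parameters (assumptions for illustration only).
    lam, k = 0.3, 0.1
    print(evalue(score=60, query_len=300, db_len=1_000_000, lam=lam, k=k))
    print(bit_score(score=60, lam=lam, k=k))
```

The bit score is convenient because it folds λ and K into the score itself, which lets E-values be compared across scoring schemes.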

Similar resources

Nutrigenomics and its Applications in Animal Science

Nutrigenomics applies genomic technologies to study how nutrients affect expression of genes. With the advent of the post genomic era and with the use of functional genomic tools, the new strategies for evaluating the effects of nutrition on production efficiency and nutrient utilization are becoming available. Nutrigenomics plays an efficient role in various fields of animal health like nutrit...

Frameshift Mutations (Deletion at Codon 1309 and Codon 849) in the APC Gene in Iranian FAP Patients: a Case Series and Review Of The literature

Familial adenomatous polyposis (FAP) is responsible for <1% of colorectal cancer (CRC) cases and is inherited as an autosomal dominant trait. Patients generally present with hundreds to thousands of adenomas and develop colorectal cancer by age 35-40 if left untreated. Here we report four patients with germline frameshift mutation (small deletion) at exon 15 of adenomatous polyposis coli (APC) tumo...

Alignment of protein-coding sequences with frameshift extension penalties

We introduce an algorithm for the alignment of protein-coding sequences accounting for frameshifts. The main specificity of this algorithm as compared to previously published protein-coding sequence alignment methods is the introduction of a penalty cost for frameshift extensions. Previous algorithms have only used constant frameshift penalties. This is similar to the use of scoring schemes with...

Post-processing long pairwise alignments

MOTIVATION The local alignment problem for two sequences requires determining similar regions, one from each sequence, and aligning those regions. For alignments computed by dynamic programming, current approaches for selecting similar regions may have potential flaws. For instance, the criterion of Smith and Waterman can lead to inclusion of an arbitrarily poor internal segment. Other approach...

The post-genomic era of biological network alignment

Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by g...

Journal:
  • Bioinformatics

Volume 30, Issue 24

Pages: -

Publication date: 2014